Transfer learning is increasingly becoming an important tool for handling the data scarcity often encountered in machine learning. In high-throughput thickness characterization, a downstream process in the high-throughput optimization of optoelectronic thin films with autonomous workflows, data scarcity arises especially for new materials. To achieve high-throughput thickness characterization, we propose thicknessML, a machine learning model that predicts thickness from UV-Vis spectrophotometry input, together with an overarching transfer learning workflow. We demonstrate the transfer learning workflow from a generic source domain of band-gapped materials to a specific target domain of perovskite materials, where the target-domain data come from only a limited number (18) of refractive indices reported in the literature. The target domain can easily be extended to other material classes with a few literature data points. Defining thickness prediction accuracy as a deviation within 10%, thicknessML achieves 92.2% accuracy (with a standard deviation of 3.6%) with transfer learning, compared to 81.8% (with a standard deviation of 11.7%) without it, i.e., a lower mean and a larger standard deviation. Experimental validation on six deposited perovskite films also corroborates the efficacy of the proposed workflow, yielding a mean absolute percentage error (MAPE) of 10.5%.
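Two metrics appear in the abstract above. The accuracy metric is the share of films whose predicted thickness deviates by at most 10% from the reference value. MAPE is the error quoted for the experimental validation. Both can be written down directly. The sketch below is a minimal illustration only. The function names and the toy thickness values are hypothetical and are not taken from thicknessML.

```python
# Hypothetical helpers (names are illustrative, not from thicknessML):
# accuracy = share of predictions within +/-10% of the true thickness,
# MAPE     = mean absolute percentage error.

def within_tolerance_accuracy(y_true, y_pred, tol=0.10):
    """Fraction of predictions whose relative deviation is at most `tol`."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(abs(p - t) / abs(t) <= tol for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - t) / abs(t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Toy thicknesses in nm (made up for illustration).
true_nm = [400, 500, 600, 300]
pred_nm = [420, 560, 590, 300]
print(within_tolerance_accuracy(true_nm, pred_nm))  # 0.75 (the 500 nm film is off by 12%)
print(round(mape(true_nm, pred_nm), 2))             # 4.67
```

The two metrics answer different questions. The within-10% accuracy counts how many films are "good enough", while MAPE averages the size of the errors.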
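Realizing general inverse design could greatly accelerate the discovery of new materials with user-defined properties. However, state-of-the-art generative models tend to be limited to specific compositions or crystal structures. Here, we present a framework capable of general inverse design (not limited to a given set of elements or crystal structures). It features a generalized invertible representation that encodes crystals in both real and reciprocal space, together with a property-structured latent space from a variational autoencoder (VAE). In three design cases, the framework generates 142 new crystals with user-defined formation energies, band gaps, thermoelectric (TE) power factors, and combinations thereof. These generated crystals, absent from the training database, are validated by first-principles calculations. The success rates (number of first-principles-validated, target-satisfying crystals / number of designed crystals) range from 7.1% to 38.9%. These results represent a significant step toward property-driven general inverse design using generative models, although practical challenges remain when coupling it with experimental synthesis.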
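The abstract above does not spell out the training objective. The following is therefore only a sketch of one common way to obtain a property-structured latent space. It combines a standard VAE loss (reconstruction plus KL divergence to a unit Gaussian prior) with an extra regression term that predicts the target property from the latent vector. The weights `beta` and `lam` and all function names are assumptions. The crystal representation, encoder, and decoder are not shown.

```python
import math
import random


def reparameterize(mu, logvar, rng):
    """Sample z = mu + sigma * eps with eps ~ N(0, 1) (reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ) for a diagonal Gaussian."""
    return -0.5 * sum(1.0 + lv - m * m - math.exp(lv) for m, lv in zip(mu, logvar))


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def property_structured_vae_loss(x, x_recon, mu, logvar, y_true, y_pred,
                                 beta=1.0, lam=1.0):
    """Reconstruction + beta * KL + lam * property-regression loss.

    The property term (a hypothetical regressor reading the latent vector)
    is what pushes the latent space to be organized by the target property.
    """
    return (mse(x, x_recon)
            + beta * kl_to_standard_normal(mu, logvar)
            + lam * mse(y_true, y_pred))


if __name__ == "__main__":
    rng = random.Random(0)
    mu, logvar = [0.2, -0.1], [-1.0, -0.5]
    z = reparameterize(mu, logvar, rng)  # latent sample for decoder and property head
    print(z)
```

For inverse design, one would then search the latent space for points whose predicted property meets the target and decode them. That search step is outside this sketch.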
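The distance between objects in a scene and the camera sensor can be determined from 2D images by estimating depth images with stereo cameras or 3D cameras. Depth estimation yields relative distances, which can then be used to calculate absolute distances that are usable in practice. However, distance estimation is very challenging with a 2D monocular camera. This paper presents a deep learning framework consisting of two deep networks for depth estimation and object detection from a single image. First, objects in the scene are detected and localized with the You Only Look Once (YOLOv5) network. In parallel, a deep autoencoder network computes an estimated depth image to obtain relative distances. The YOLO-based object detector is trained with supervised learning, whereas the depth estimation network is trained in a self-supervised manner. The presented distance estimation framework is evaluated on real images of outdoor scenes. The results show that the framework is promising, achieving 96% accuracy with an RMSE of 0.203 for the correct absolute distance.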
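Combining the two networks requires a rule for turning a per-pixel depth map and a detected bounding box into one distance per object. The rule assumed here is the median depth inside the box, multiplied by a hypothetical `scale` factor that converts relative depth to absolute distance. This is not necessarily the rule used in the paper. RMSE, the error metric quoted in the results, is included for completeness.

```python
import math
from statistics import median


def object_distance(depth_map, box, scale=1.0):
    """Estimate an object's distance as the median depth inside its bounding box.

    depth_map: 2D list (rows x cols) of relative depth values.
    box: (x_min, y_min, x_max, y_max) in pixel coordinates, max exclusive.
    scale: hypothetical factor mapping relative depth to an absolute distance.
    """
    x0, y0, x1, y1 = box
    values = [depth_map[r][c] for r in range(y0, y1) for c in range(x0, x1)]
    if not values:
        raise ValueError("empty bounding box")
    # The median is robust to background pixels that leak into the box.
    return scale * median(values)


def rmse(pred, target):
    """Root-mean-square error between predicted and true distances."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred))


depth = [[1, 1, 5],
         [1, 2, 5],
         [9, 9, 9]]
print(object_distance(depth, (0, 0, 2, 2), scale=2.0))  # 2.0
```

Using the median rather than the mean keeps a few far-away background pixels inside a loose box from dragging the estimate.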
While the capabilities of autonomous systems have been steadily improving in recent years, these systems still struggle to rapidly explore previously unknown environments without the aid of GPS-assisted navigation. The DARPA Subterranean (SubT) Challenge aimed to fast-track the development of autonomous exploration systems by evaluating their performance in real-world underground search-and-rescue scenarios. Subterranean environments present a plethora of challenges for robotic systems, such as limited communications, complex topology, visually degraded sensing, and harsh terrain. The presented solution enables long-term autonomy with minimal human supervision by combining a powerful and independent single-agent autonomy stack with higher-level mission management operating over a flexible mesh network. The autonomy suite deployed on quadruped and wheeled robots was fully independent, freeing the human supervisor to loosely oversee the mission and make high-impact strategic decisions. We also discuss lessons learned from fielding our system at the SubT Final Event, relating to vehicle versatility, system adaptability, and reconfigurable communications.
Attention mechanisms form a core component of several successful deep learning architectures and are based on one key idea: "The output depends only on a small (but unknown) segment of the input." In several practical applications, such as image captioning and language translation, this is mostly true. In trained models with an attention mechanism, the outputs of an intermediate module that encodes the segment of the input responsible for the output are often used as a way to peek into the "reasoning" of the network. We make this notion more precise for a variant of the classification problem that we term selective dependence classification (SDC) when used with attention model architectures. In this setting, we demonstrate various error modes in which an attention model can be accurate but fail to be interpretable, and show that such models do arise as a result of training. We illustrate various situations that can accentuate or mitigate this behaviour. Finally, we use our objective definition of interpretability for SDC tasks to evaluate a few attention model learning algorithms designed to encourage sparsity, and demonstrate that these algorithms help improve interpretability.
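Here is a toy version of the setting above. It computes attention weights over input segments, reads out the attended value, and checks whether the most-attended segment is the one the label actually depends on. Softmax gives every segment a nonzero weight. Sparsemax (Martins and Astudillo, 2016), one well-known sparsity-inducing alternative, can assign exactly zero. Whether sparsemax is among the algorithms evaluated in the paper is not stated here. `attention_is_interpretable` is a hypothetical stand-in for the paper's formal definition.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sparsemax(scores):
    """Euclidean projection of `scores` onto the probability simplex."""
    z = sorted(scores, reverse=True)
    cumsum, k_max, cumsum_at_k = 0.0, 0, 0.0
    for k, zk in enumerate(z, start=1):
        cumsum += zk
        if 1.0 + k * zk > cumsum:  # holds for a prefix of k; keep the largest
            k_max, cumsum_at_k = k, cumsum
    tau = (cumsum_at_k - 1.0) / k_max  # threshold so the result sums to 1
    return [max(s - tau, 0.0) for s in scores]


def attend(scores, values, normalizer=softmax):
    """Return attention weights and the attention-weighted value."""
    weights = normalizer(scores)
    return weights, sum(w * v for w, v in zip(weights, values))


def attention_is_interpretable(weights, relevant_index):
    """Hypothetical check: is the most-attended segment the one the label depends on?"""
    return max(range(len(weights)), key=weights.__getitem__) == relevant_index


# The label depends on segment 2, but attention concentrates on segment 0.
weights, _ = attend([3.0, 0.5, 1.0], [0.2, 0.1, 0.9])
print(attention_is_interpretable(weights, relevant_index=2))  # False
```

A model like this can still classify correctly if the value it reads from segment 0 happens to correlate with the label. That is exactly the accurate-but-not-interpretable case described above.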
Recent advances in deep learning have enabled us to address the curse of dimensionality (COD) by solving problems in higher dimensions. A subset of these approaches has made it possible to solve high-dimensional PDEs. This has opened the door to a variety of real-world problems, ranging from mathematical finance to stochastic control for industrial applications. Although feasible, these deep learning methods are still constrained by training time and memory. To tackle these shortcomings, we demonstrate that Tensor Neural Networks (TNN) can provide significant parameter savings while attaining the same accuracy as the classical Dense Neural Network (DNN). We also show that TNN can be trained faster than DNN for the same accuracy. Besides TNN, we introduce the Tensor Network Initializer (TNN Init), a weight initialization scheme that leads to faster convergence with smaller variance than a DNN with an equivalent parameter count. We benchmark TNN and TNN Init by applying them to solve the parabolic PDE associated with the Heston model, which is widely used in financial pricing theory.
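The abstract above does not say how TNN factorizes its layers. This sketch assumes a matrix product operator (tensor-train) layer, a common tensor-network parameterization. Under that assumption, the parameter savings over a dense layer follow from a simple count: core k has shape (r_{k-1}, m_k, n_k, r_k), with boundary ranks equal to 1. The function names and the bond dimension are illustrative.

```python
from math import prod


def dense_params(in_dims, out_dims, bias=False):
    """Parameters of a dense layer mapping prod(in_dims) -> prod(out_dims)."""
    n_in, n_out = prod(in_dims), prod(out_dims)
    return n_in * n_out + (n_out if bias else 0)


def mpo_params(in_dims, out_dims, bond_dim, bias=False):
    """Parameters of the same weight matrix factorized as a matrix product operator.

    Core k has shape (r_{k-1}, in_dims[k], out_dims[k], r_k), with r_0 = r_N = 1
    and every inner bond dimension equal to bond_dim (an assumption for simplicity).
    """
    if len(in_dims) != len(out_dims):
        raise ValueError("in_dims and out_dims need the same number of factors")
    n = len(in_dims)
    ranks = [1] + [bond_dim] * (n - 1) + [1]
    cores = sum(ranks[k] * in_dims[k] * out_dims[k] * ranks[k + 1] for k in range(n))
    return cores + (prod(out_dims) if bias else 0)


# A 256 x 256 layer, with both sides factorized as 4 x 4 x 4 x 4.
print(dense_params([4] * 4, [4] * 4), mpo_params([4] * 4, [4] * 4, bond_dim=2))  # 65536 192
```

The bond dimension trades parameter count against expressivity. The count alone says nothing about accuracy; the claim of equal accuracy rests on the paper's experiments.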
Artificial neural networks can learn complex, salient data features to achieve a given task. At the opposite end of the spectrum, mathematically grounded methods such as topological data analysis allow users to design analysis pipelines fully aware of data constraints and symmetries. We introduce a class of persistence-based neural network layers. Persistence-based layers allow users to easily inject knowledge about symmetries (equivariance) respected by the data, are equipped with learnable weights, and can be composed with state-of-the-art neural architectures.
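The following sketch illustrates what a persistence-based layer computes. It takes a 1D signal and computes its 0-dimensional persistence diagram under the sublevel-set filtration with a union-find pass. It then feeds the diagram to a small layer with learnable weights. This is a generic construction under stated assumptions, not the layer class proposed in the paper. `persistence_layer` and its feature map are hypothetical. Reversing the signal leaves the diagram unchanged, which is one example of a symmetry such a pipeline respects by construction.

```python
import math


def sublevel_persistence_0d(signal):
    """0-dimensional persistence pairs of a 1D signal under the sublevel-set filtration.

    Vertices enter in order of increasing value; when two components merge,
    the younger one (larger birth value) dies (elder rule). Zero-persistence
    pairs are dropped, and the global minimum gets death = inf.
    """
    n = len(signal)
    order = sorted(range(n), key=lambda i: signal[i])
    parent = [None] * n  # None = vertex not yet in the filtration
    birth = [0.0] * n    # birth value stored at each component root

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    pairs = []
    for i in order:
        parent[i], birth[i] = i, signal[i]
        for j in (i - 1, i + 1):
            if 0 <= j < n and parent[j] is not None:
                ri, rj = find(i), find(j)
                if ri == rj:
                    continue
                old, young = (ri, rj) if birth[ri] <= birth[rj] else (rj, ri)
                if birth[young] < signal[i]:
                    pairs.append((birth[young], signal[i]))
                parent[young] = old
    if n:
        pairs.append((min(signal), math.inf))
    return pairs


def persistence_layer(pairs, weights, bias=0.0):
    """Hypothetical learnable, permutation-invariant vectorization of a diagram.

    Each finite pair (b, d) is mapped to features (b, d, d - b); the output is
    bias + sum over pairs of weights . features, so pair order does not matter.
    """
    out = bias
    for b, d in pairs:
        if math.isinf(d):
            continue  # this simple sketch ignores the essential class
        out += sum(w * f for w, f in zip(weights, (b, d, d - b)))
    return out


diagram = sublevel_persistence_0d([0, 3, 1, 4, 2])
print(diagram)                                      # [(1, 3), (2, 4), (0, inf)]
print(persistence_layer(diagram, (0.0, 0.0, 1.0)))  # 4.0
```

In a trainable model, `weights` and `bias` would be parameters updated by gradient descent alongside the rest of the network.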
State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. We find that existing SSMs struggle with two capabilities: recalling earlier tokens in the sequence and comparing tokens across the sequence. To understand the impact on language modeling, we propose a new SSM layer, H3, that is explicitly designed for these abilities. H3 matches attention on the synthetic languages and comes within 0.4 PPL of Transformers on OpenWebText. Furthermore, a hybrid 125M-parameter H3-attention model that retains two attention layers surprisingly outperforms Transformers on OpenWebText by 1.0 PPL. Next, to improve the efficiency of training SSMs on modern hardware, we propose FlashConv. FlashConv uses a fused block FFT algorithm to improve efficiency on sequences up to 8K, and introduces a novel state passing algorithm that exploits the recurrent properties of SSMs to scale to longer sequences. FlashConv yields a 2$\times$ speedup on the Long Range Arena benchmark and allows hybrid language models to generate text 1.6$\times$ faster than Transformers. Using FlashConv, we scale hybrid H3-attention language models up to 1.3B parameters on the Pile and find promising initial results, achieving lower perplexity than Transformers and outperforming Transformers in zero- and few-shot learning on a majority of tasks in the SuperGLUE benchmark.
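FlashConv exploits the fact that an SSM has both a convolutional view and a recurrent view, and the two can be checked against each other on a scalar diagonal SSM. The sketch below computes the same output three ways:

1. By recurrence.
2. By causal convolution with the kernel K_k = c a^k b.
3. Chunk by chunk, convolving each chunk independently and then adding back the state carried over from earlier chunks.

This is a simplified, assumption-laden illustration of the state-passing idea. It uses a direct O(L^2) convolution rather than a fused block FFT, and it is not the H3 layer itself.

```python
def ssm_recurrent(u, a, b, c):
    """Scalar diagonal SSM: x_t = a*x_{t-1} + b*u_t, y_t = c*x_t, with x_{-1} = 0."""
    x, ys = 0.0, []
    for ut in u:
        x = a * x + b * ut
        ys.append(c * x)
    return ys


def ssm_kernel(length, a, b, c):
    """Convolution kernel K_k = c * a**k * b of the same SSM."""
    return [c * a**k * b for k in range(length)]


def causal_conv(u, kernel):
    """Direct O(L^2) causal convolution (an FFT would make this O(L log L))."""
    return [sum(kernel[k] * u[t - k] for k in range(t + 1)) for t in range(len(u))]


def ssm_chunked(u, a, b, c, chunk):
    """Convolve each chunk independently, then add the contribution of the
    state carried over from previous chunks (a simplified 'state passing')."""
    kernel = ssm_kernel(chunk, a, b, c)
    ys, state = [], 0.0
    for start in range(0, len(u), chunk):
        block = u[start:start + chunk]
        local = causal_conv(block, kernel)
        # The carried-in state decays by a**(i+1) by local position i.
        ys.extend(y + c * a ** (i + 1) * state for i, y in enumerate(local))
        # Advance the state to the end of this chunk in closed form.
        n = len(block)
        state = a**n * state + sum(a ** (n - 1 - j) * b * block[j] for j in range(n))
    return ys


u = [1.0, -2.0, 0.5, 3.0, 0.0, 1.5, -1.0]
print(ssm_recurrent(u, 0.9, 0.5, 2.0))
print(ssm_chunked(u, 0.9, 0.5, 2.0, chunk=3))  # same values up to rounding
```

Only one scalar of state crosses each chunk boundary here. That is why the chunked form can scale past the length handled by a single convolution.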
KL-regularized reinforcement learning from expert demonstrations has proved successful in improving the sample efficiency of deep reinforcement learning algorithms, allowing them to be applied to challenging physical real-world tasks. However, we show that KL-regularized reinforcement learning with behavioral reference policies derived from expert demonstrations can suffer from pathological training dynamics that can lead to slow, unstable, and suboptimal online learning. We show empirically that the pathology occurs for commonly chosen behavioral policy classes and demonstrate its impact on sample efficiency and online policy performance. Finally, we show that the pathology can be remedied by non-parametric behavioral reference policies and that this allows KL-regularized reinforcement learning to significantly outperform state-of-the-art approaches on a variety of challenging locomotion and dexterous hand manipulation tasks.
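The following shows why the class of behavioral reference policy matters. With Gaussian policies, the KL penalty has a closed form. It grows without bound as the reference policy's standard deviation shrinks, so the penalty can swamp the reward. The sketch assumes univariate Gaussian policies and a hypothetical weight `alpha`. It illustrates the sensitivity of the objective only, not the paper's analysis of the pathology or its non-parametric remedy.

```python
import math


def gaussian_kl(mu_p, std_p, mu_q, std_q):
    """KL(N(mu_p, std_p^2) || N(mu_q, std_q^2)) for univariate Gaussians."""
    return (math.log(std_q / std_p)
            + (std_p**2 + (mu_p - mu_q) ** 2) / (2.0 * std_q**2)
            - 0.5)


def kl_regularized_objective(reward, mu_pi, std_pi, mu_ref, std_ref, alpha=0.1):
    """Per-state objective r - alpha * KL(pi || pi_0) with Gaussian policies.

    alpha is a hypothetical regularization weight.
    """
    return reward - alpha * gaussian_kl(mu_pi, std_pi, mu_ref, std_ref)


# The same online policy N(0.5, 0.3^2), compared against reference policies
# centred at 0.0 whose standard deviation shrinks.
for std_ref in (1.0, 0.1, 0.01):
    print(std_ref, kl_regularized_objective(1.0, 0.5, 0.3, 0.0, std_ref))
```

As `std_ref` shrinks, the penalty term dominates, and the online policy is pulled hard toward the reference mean regardless of the reward signal.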
Three main points:
1. Data Science (DS) will be increasingly important to heliophysics;
2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and
3. To grow with the pace of data, technology, and workforce changes, heliophysics requires a new approach to the representation of knowledge.